fix: definition coverage gaps in words hub and 5 small languages #135
Conversation
The words hub (`/<lang>/words`) and individual word pages (`/<lang>/word/<id>`) only read from the disk cache for definitions, missing the 206K pre-built kaikki definitions that exist in memory. This caused most words on the words hub to show no definition (e.g. Spanish had 3/7,405 cached). Now both routes fall back to kaikki native → kaikki English lookups on cache miss. These are in-memory dict lookups with zero latency.

Also adds 5 zero-coverage languages (`pau`, `ie`, `rw`, `tlh`, `qya`) to the LLM fallback allowlist — they had no kaikki files, UNRELIABLE parser confidence, and were missing from the LLM allowlist, so every definition request returned None.
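The cache-miss fallback chain described above can be sketched roughly as follows. This is a minimal illustration, not the actual `app.py` code: the dict names and the `lookup_definition` helper are assumptions, and the kaikki data is stubbed with toy entries.

```python
# Illustrative sketch of the definition fallback chain (names are assumptions).
# Pre-built kaikki definitions live in plain in-memory dicts, so a cache miss
# costs one dict lookup per tier — no disk or network I/O.
KAIKKI_NATIVE = {("es", "perro"): "Animal doméstico de compañía"}  # native-language defs
KAIKKI_ENGLISH = {("es", "perro"): "dog"}                          # English defs

def lookup_definition(word, lang_code, disk_cache):
    """Disk cache first, then kaikki native, then kaikki English."""
    cached = disk_cache.get((lang_code, word))
    if cached:
        return cached
    key = (lang_code, word.lower())
    # Native-language definition preferred; English is the last resort.
    return KAIKKI_NATIVE.get(key) or KAIKKI_ENGLISH.get(key)

print(lookup_definition("perro", "es", {}))  # falls through to the kaikki native tier
```

With an empty disk cache the lookup falls straight through to the in-memory dicts, which is the behavior the fix adds to both routes.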
@coderabbitai review
📝 Walkthrough

The changes implement a disk-cache-first fallback strategy for definition lookups across multiple functions in the application, importing new Wiktionary lookup functions for native and English language definitions. Additionally, five new language codes are added to the language names mapping to support Palauan, Interlingue, Kinyarwanda, Klingon, and Quenya.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
✅ Actions performed: review triggered.
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
webapp/app.py (1)
Lines 1418-1431: ⚠️ Potential issue | 🟡 Minor

Inconsistent handling of negative cache entries compared to `language_words_hub`.

In `language_words_hub` (lines 1174-1175), the code checks `loaded and loaded.get("definition")` before assigning, which correctly skips negative cache entries like `{"not_found": True, "ts": ...}`. Here in `word_page`, the assignment `definition = loaded if loaded else None` would treat a negative cache entry as a valid definition (since `loaded` is truthy), causing the kaikki fallback to be skipped.

Proposed fix to align with `language_words_hub` logic:
```diff
 if os.path.exists(cache_path):
     try:
         with open(cache_path, "r") as f:
             loaded = json.load(f)
-        definition = loaded if loaded else None
+        if loaded and loaded.get("definition"):
+            definition = loaded
     except Exception:
         pass
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@webapp/app.py` around lines 1418 - 1431, The code in word_page incorrectly treats any loaded cache JSON as a valid definition, causing negative cache entries to short-circuit kaikki lookups; update the cache-load branch in word_page to mirror language_words_hub by assigning definition only when loaded and loaded.get("definition") is truthy (i.e., change the logic that sets definition = loaded if loaded else None to require loaded.get("definition")), so negative entries like {"not_found": true, "ts": ...} are skipped and lookup_kaikki_native / lookup_kaikki_english still run.
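The distinction the reviewer is drawing can be demonstrated in isolation. This is a standalone sketch, not application code; the cache-entry shapes are taken from the comment above.

```python
# A negative cache entry is a non-empty dict, hence truthy — so the bare
# `loaded if loaded else None` check accepts it as a definition.
negative_entry = {"not_found": True, "ts": 1700000000}
valid_entry = {"definition": "dog", "ts": 1700000000}

# Buggy check (word_page): any non-empty dict passes, skipping the fallback.
definition_buggy = negative_entry if negative_entry else None

# Correct check (language_words_hub): require an actual "definition" key.
definition_fixed = negative_entry if negative_entry.get("definition") else None

print(definition_buggy)  # the negative entry itself — kaikki fallback skipped
print(definition_fixed)  # None — kaikki fallback still runs
print(valid_entry if valid_entry.get("definition") else None)  # valid entry still accepted
```

The stricter check lets genuine cached definitions through while letting `{"not_found": True, ...}` fall through to `lookup_kaikki_native` / `lookup_kaikki_english`.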
📝 Walkthrough

The pull request enhances definition retrieval by implementing a two-step fallback mechanism for missing cached definitions, attempting native-language lookups first before falling back to English equivalents. Additionally, five new languages (Palauan, Interlingue, Kinyarwanda, Klingon, Quenya) are registered in the language-name mapping system.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
🧹 Nitpick comments (1)
webapp/app.py (1)
Lines 1418-1431: The fallback logic is correctly duplicated from `language_words_hub`. The implementation is consistent with the hub page, which is good.

Consider extracting the shared definition-loading logic (disk cache + kaikki fallbacks) into a helper function to reduce duplication. This would consolidate the pattern used in both `language_words_hub` and `word_page`.

♻️ Optional: Extract shared definition loading to a helper:
```python
def _load_definition_with_fallback(word, lang_code):
    """Load definition from disk cache, falling back to kaikki pre-built dicts."""
    cache_path = os.path.join(WORD_DEFS_DIR, lang_code, f"{word.lower()}.json")
    if os.path.exists(cache_path):
        try:
            with open(cache_path, "r") as f:
                loaded = json.load(f)
            if loaded and loaded.get("definition"):
                return loaded
        except Exception:
            pass
    definition = lookup_kaikki_native(word, lang_code)
    if not definition:
        definition = lookup_kaikki_english(word, lang_code)
    return definition
```

Then use `definition = _load_definition_with_fallback(word, lang_code)` in both routes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@webapp/app.py` around lines 1418 - 1431, Extract the duplicated disk-cache + kaikki fallback logic into a helper function (e.g. _load_definition_with_fallback(word, lang_code)) that: checks the cache path under WORD_DEFS_DIR/lang_code/word.lower() and attempts json.load inside a try/except, returns the loaded dict if valid (or None on error), otherwise calls lookup_kaikki_native(word, lang_code) and then lookup_kaikki_english(word, lang_code) as fallbacks and returns the first non-None result; then replace the duplicated blocks in word_page and language_words_hub with a single call to this helper to obtain definition.
Summary

The words hub and individual word pages (`/<lang>/words`, `/<lang>/word/<id>`) only read definitions from the disk cache, completely missing the 206K pre-built kaikki definitions that are already loaded in memory. This caused most words on the words hub to show no definition — e.g. Spanish showed definitions for only 3 out of 7,405 words.

Changes

- `app.py`: Both `language_words_hub()` and `word_page()` now fall back to `lookup_kaikki_native()` → `lookup_kaikki_english()` on disk cache miss. These are in-memory dict lookups (zero latency, no network calls).
- `wiktionary.py`: Added `pau`, `ie`, `rw`, `tlh`, `qya` to `LLM_LANG_NAMES` so they can use the gpt-4o-mini fallback.

Test plan

- `/es/words` — definitions should now appear for all words with kaikki entries
- `/pau/words` — LLM definitions should be fetchable via individual word pages
- `/en/word/<id>` — definition should server-render without needing JS fallback fetch
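The `wiktionary.py` change amounts to a small mapping update. A hedged sketch of what the allowlist gate looks like — the dict name `LLM_LANG_NAMES` comes from the description above, but its pre-existing entries and the `llm_fallback_allowed` helper are illustrative assumptions:

```python
# Language code -> display name used when building the LLM prompt.
# The five new entries cover languages with no kaikki files and UNRELIABLE
# parser confidence, so the LLM fallback is their only definition source.
LLM_LANG_NAMES = {
    "es": "Spanish",  # assumed pre-existing entry, shown for context
    # newly added zero-coverage languages
    "pau": "Palauan",
    "ie": "Interlingue",
    "rw": "Kinyarwanda",
    "tlh": "Klingon",
    "qya": "Quenya",
}

def llm_fallback_allowed(lang_code):
    """Only languages in the allowlist may hit the gpt-4o-mini fallback."""
    return lang_code in LLM_LANG_NAMES

print(llm_fallback_allowed("tlh"))  # True
print(llm_fallback_allowed("zz"))   # False
```

Before this change, a request for a `tlh` word fell through every tier (no kaikki file, parser too unreliable, not in the allowlist) and returned None; adding the codes to the mapping opens the final tier.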